Part 3 - pow: remove pyethash C extension, always use pure Python ethash #973

Open
ping-ke wants to merge 14 commits into upgrade/py313-baseline from upgrade/ethash

@ping-ke ping-ke commented Mar 15, 2026

Summary

pyethash is a C extension that is not compatible with Python 3.13. The pure Python implementation in ethereum.pow.ethash can replace it, but simply removing pyethash made synchronization 10–20× slower, which makes block sync impractical.

This PR:

  • Fixes the compatibility issue by removing pyethash
  • Introduces a 4-round optimization pipeline (R1–R4) to recover and exceed the original performance
  • Reduces sync time from 25.24 s to 0.86 s (~29× faster than the old pure Python path, and faster than pyethash at 1.47 s)

Problem

pyethash 0.1.27 crashes with a segfault or a floating point exception on Python 3.13 when hashimoto_light() is called.

Removing pyethash and falling back to the existing pure Python implementation leads to:

  • Heavy overhead in hex encoding/decoding
  • Excessive Python object allocations in hot loops
  • Severe performance degradation in PoW and block synchronization
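
The hex encoding/decoding overhead can be illustrated with a small sketch. `serialize_hash_hex` below is a simplified stand-in for a hex-based path (it is not the repository's original code), and `serialize_hash_struct` / `deserialize_hash_struct` are hypothetical names for the struct.pack/unpack approach mentioned in the commits of this PR. Both assume little-endian 32-bit words.

```python
import struct


def serialize_hash_hex(words):
    """Simplified hex-based path: one hex string and one bytes object per word."""
    out = b""
    for w in words:
        # Format as 8 hex digits (big-endian), decode, then reverse to little-endian.
        out += bytes.fromhex("%08x" % w)[::-1]
    return out


def serialize_hash_struct(words):
    """Pack all words in a single call as little-endian uint32 values."""
    return struct.pack("<%dI" % len(words), *words)


def deserialize_hash_struct(data):
    """Inverse of serialize_hash_struct: bytes back to a list of uint32 words."""
    return list(struct.unpack("<%dI" % (len(data) // 4), data))


words = [0, 1, 0xFFFFFFFF, 0x12345678] * 4  # 16 words, like one 64-byte hash
assert serialize_hash_hex(words) == serialize_hash_struct(words)
assert deserialize_hash_struct(serialize_hash_struct(words)) == words
```

The struct variant avoids the per-word string formatting and the intermediate bytes objects, which is where a hex-based path spends most of its time.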

Root Cause

A bug in src/python/core.c of pyethash:

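The "y#" format in PyArg_ParseTuple writes a Py_ssize_t (8 bytes on 64-bit platforms) through each length pointer, but core.c passes pointers to int variables (4 bytes). Each write therefore overruns its target by 4 bytes and corrupts the stack.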

// core.c line 76-77 (pyethash 0.1.27)
int cache_size, header_size;  // BUG: should be Py_ssize_t
if (!PyArg_ParseTuple(args, "k" PY_STRING_FORMAT PY_STRING_FORMAT "K",
    &block_number, &cache_bytes, &cache_size, &header, &header_size, &nonce))
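
To make the overrun concrete, here is a small ctypes model of the same mistake. It is only a sketch: `Locals` and `write_length_as_ssize_t` are hypothetical names, and the two adjacent struct fields merely stand in for the two `int` locals (a real compiler is free to lay out the stack differently).

```python
import ctypes


class Locals(ctypes.Structure):
    # Two adjacent 4-byte ints, standing in for `int cache_size, header_size;`.
    _fields_ = [
        ("cache_size", ctypes.c_int32),
        ("header_size", ctypes.c_int32),
    ]


def write_length_as_ssize_t(frame, length):
    """Store `length` through a 64-bit pointer aimed at frame.cache_size."""
    wide = ctypes.cast(ctypes.pointer(frame), ctypes.POINTER(ctypes.c_int64))
    wide[0] = length  # 8-byte store into a 4-byte slot


frame = Locals(cache_size=0, header_size=32)
write_length_as_ssize_t(frame, 1024)

# The 8-byte store spilled into the neighbouring field, so header_size
# no longer holds the value that was assigned to it.
print(frame.cache_size, frame.header_size)
```

Declaring both length variables as Py_ssize_t, as the BUG comment in the snippet above suggests, gives each store the 8 bytes it needs.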

Solution

This PR removes pyethash entirely and replaces it with a progressively optimized Ethash implementation, eliminating Python bottlenecks in the PoW hot path while restoring (and exceeding) the original performance.

Optimization Strategy

To systematically eliminate bottlenecks, we applied four incremental optimization rounds, each targeting a different layer:

  • R1–R2 (Python-level optimizations)
    Remove serialization overhead and reduce Python object allocations

  • R3 (Cython hot loop)
    Move the hottest FNV mixing loop into C

  • R4 (Full C pipeline)
    Eliminate Python overhead entirely in the PoW critical path (including Keccak)

See #976 for more detail
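
For readers unfamiliar with the loop that R3 targets, the following is a pure Python reference sketch of FNV parent mixing. The `fnv` definition follows the Ethash specification; the `mix_parents` signature shown here is an assumption for illustration and may not match the one in ethash_cy.pyx. The Keccak steps before and after the loop are omitted.

```python
FNV_PRIME = 0x01000193
MASK32 = 0xFFFFFFFF
DATASET_PARENTS = 256  # iteration count of the parent-mixing loop


def fnv(a, b):
    """Ethash FNV step on 32-bit words: (a * FNV_PRIME) ^ b, truncated to 32 bits."""
    return ((a * FNV_PRIME) ^ b) & MASK32


def mix_parents(cache, mix, index, parents=DATASET_PARENTS):
    """Fold `parents` pseudo-randomly chosen cache rows into a copy of `mix`.

    `cache` is a sequence of rows, each row a sequence of 32-bit words with
    the same length as `mix`. The input `mix` is left unmodified.
    """
    n = len(cache)
    r = len(mix)
    mix = list(mix)
    for j in range(parents):
        # Pick the next parent row from the current mix state.
        cache_index = fnv(index ^ j, mix[j % r])
        row = cache[cache_index % n]
        # Element-wise FNV of the mix with the chosen row.
        mix = [fnv(m, c) for m, c in zip(mix, row)]
    return mix


# Toy data only: a 4-row cache of 16 words each, not a real Ethash cache.
toy_cache = [[row * 16 + k for k in range(16)] for row in range(4)]
toy_mix = list(range(16))
result = mix_parents(toy_cache, toy_mix, index=5)
assert len(result) == 16 and all(0 <= w <= MASK32 for w in result)
```

Every iteration allocates a new 16-element list and performs 17 Python-level multiplications, which is the kind of per-iteration interpreter overhead a typed C loop avoids.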

Result

Root Block Sync Time (end-to-end)

Syncing one root block with 144 miniblocks (maximum load)

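| impl | sync time | time vs pyethash | speedup vs old | speedup vs R2 |
|---|---|---|---|---|
| pyethash | 1.47 s | | ~17× | |
| old | 25.24 s | ~17× | | |
| R1 | 12.38 s | ~8.4× | ~2.0× | |
| R2 | 8.52 s | ~5.8× | ~3.0× | |
| R3 | 1.39 s | ~0.95× | ~18× | ~6× |
| R4 | 0.86 s | ~0.58× | ~29× | ~10× |
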
  • Restores performance after removing pyethash
  • Achieves ~29× speedup vs original Python implementation
  • Achieves better performance than pyethash baseline

Test plan

  • Mining and PoW verification still pass in existing tests
  • No ImportError on Python 3.13
  • Benchmarks cover old / R1 / R2 / R3 / R4
  • Sync tested end-to-end with profiling and timing logs
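
A benchmark that compares several implementations is only meaningful if they all produce the same output. The sketch below shows one way to combine both checks using the standard timeit module. `compare` is a hypothetical helper, not the code in bench_hashimoto_compare.py, and the two toy functions merely stand in for the old / R1–R4 variants.

```python
import timeit


def compare(impls, *args, number=100, repeat=3):
    """Check that all implementations agree, then time each one.

    `impls` maps a label to a callable. Returns a dict of label -> best
    observed seconds per call.
    """
    reference = None
    timings = {}
    for name, fn in impls.items():
        result = fn(*args)
        if reference is None:
            reference = result
        elif result != reference:
            raise AssertionError(f"{name} disagrees with the first implementation")
        best = min(timeit.repeat(lambda: fn(*args), number=number, repeat=repeat))
        timings[name] = best / number
    return timings


def slow_sum(xs):
    total = 0
    for x in xs:
        total += x
    return total


timings = compare({"loop": slow_sum, "builtin": sum}, list(range(1000)))
for name, seconds in timings.items():
    print(f"{name}: {seconds * 1e6:.1f} us per call")
```

Taking the minimum over several repeats reduces noise from other processes; absolute numbers still depend on the machine, so ratios between implementations are the more portable result.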

pyethash is a C++ extension that is not compatible with Python 3.13.
The pure Python implementation in ethereum.pow.ethash is sufficient.
Remove the conditional import and always use the Python path, adding
@lru_cache to get_cache_slow for the same performance benefit.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
ping-ke added 2 commits March 30, 2026 23:22
Add comment explaining why pyethash C++ acceleration was removed
(not supported on Python 3.13) with link to #976
@ping-ke ping-ke changed the base branch from upgrade/py313-baseline to master April 5, 2026 02:37
@ping-ke ping-ke changed the base branch from master to upgrade/py313-baseline April 5, 2026 02:37
@ping-ke ping-ke requested a review from syntrust April 5, 2026 02:38
ping-ke added 7 commits April 9, 2026 15:42
…arrays

ethash_utils.py:
- replace hex-based encode_int/decode_int with struct.pack/unpack for
  serialize_hash and deserialize_hash (~30x faster per call)
- inline ethash_sha3_512/256 to skip intermediate list conversion (~5x faster
  on list input)
- add ethash_sha3_512_np and ethash_sha3_256_np: numpy ndarray variants that
  accept bytes or ndarray and return uint32 ndarray, eliminating tolist()/
  np.array() round-trips in the hot path
- consolidate keccak implementation here; ethash.py no longer duplicates it

ethash.py:
- store cache as 2D numpy uint32 ndarray (shape n x 16) via _get_cache
- use ethash_sha3_512_np/256_np throughout to keep data in ndarray form
- vectorize the 16-element mix update in calc_dataset_item and hashimoto
  inner loop using numpy arithmetic instead of list(map(fnv, ...))
- scalar fnv for cache_index uses plain Python int to avoid numpy scalar overhead

test_ethash.py:
- add TestEthashUtils covering serialize_hash, deserialize_hash, fnv,
  ethash_sha3_512/256 directly against reference implementations

Benchmark: hashimoto_light ~23% faster end-to-end vs pure Python baseline;
serialize_hash/deserialize_hash ~30x faster individually

ethash_utils.py:
- remove struct, _FMT_16I, _FMT_8I (only served deleted serialize/deserialize_hash)
- remove fnv (only used in tests, not in production path)
- remove ethash_sha3_512 list variant (replaced by numpy ndarray variant)
- remove serialize_hash, deserialize_hash, hash_words, xor, serialize_cache,
  deserialize_cache and related aliases (all replaced by ndarray.tobytes/frombuffer)

ethash.py:
- mkcache: use ethash_sha3_256_np(...).tobytes() directly, drop serialize_hash
- drop serialize_hash import (no longer needed)

ethpow.py:
- remove pyethash C-extension dead code paths (get_cache/hashimoto were always
  equal to get_cache_slow/hashimoto_slow after pyethash removal)
- keep get_cache_slow/hashimoto_slow structure as fallback for future Cython ext

test_ethash.py:
- remove test cases for deleted functions (serialize_hash, deserialize_hash, fnv,
  ethash_sha3_256, ethash_sha3_512 list variant)
- cache/dataset hex comparison uses ndarray.tobytes().hex() directly

bench_before_after.py, bench_hashimoto_compare.py:
- add old/mid/new three-way comparison
- old implementations kept inline for regression reference
- new side imports from current ethash module directly
…umpy

ethash_cy.pyx: typed C loop replacing the 256-iteration FNV parent mixing
in calc_dataset_item. ethash.py auto-imports when built, falls back to pure
Python otherwise. bench_hashimoto_compare.py extended with R3 column.

- ethash.py: rewrite with numpy uint32 arrays (R2); add ETHASH_LIB env var
  to select python/cython/auto at runtime
- ethash_cy.pyx: add mix_parents (R3), cy_calc_dataset_item and
  cy_hashimoto_light with C keccak (R4)
- keccak_tiny.c/h: portable C Keccak implementation for Cython R4
- ethpow.py: use ETHASH_LIB-aware hashimoto_light; simplify check_pow/mine
- setup.py: build Cython extension with keccak_tiny.c
- old_ethash.py: extract original hex-based implementation as reference baseline
- bench_hashimoto_compare.py: merge bench_before_after.py; add R3/R4 sections;
  import old impl from old_ethash.py
- test_ethash.py: use old_ethash as baseline for cython correctness test
- remove bench_before_after.py
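
The runtime selection described above (ETHASH_LIB choosing python, cython or auto, with a fallback to pure Python when the extension is not built) could look roughly like the sketch below. `select_backend` and its `importer` parameter are hypothetical, and the bare module name "ethash_cy" is an assumption about the import path; the actual logic in ethash.py may differ.

```python
import importlib
import os

VALID_CHOICES = ("python", "cython", "auto")


def select_backend(choice=None, importer=importlib.import_module):
    """Return (backend_name, module_or_None) for the requested backend.

    `choice` defaults to the ETHASH_LIB environment variable, then "auto".
    "auto" prefers the compiled extension and silently falls back to pure
    Python; "cython" re-raises the ImportError if the extension is missing.
    """
    choice = (choice or os.environ.get("ETHASH_LIB", "auto")).lower()
    if choice not in VALID_CHOICES:
        raise ValueError(f"ETHASH_LIB must be one of {VALID_CHOICES}, got {choice!r}")
    if choice == "python":
        return "python", None
    try:
        module = importer("ethash_cy")  # assumed module name
    except ImportError:
        if choice == "cython":
            raise
        return "python", None
    return "cython", module


backend, module = select_backend()
print("using ethash backend:", backend)
```

Making an explicit "cython" request fail loudly helps benchmarks avoid silently measuring the pure Python path when the extension was not built.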

ping-ke commented Apr 9, 2026

Added performance improvements and the related tests and benchmarks. See #976 for more information.
